Combining reward shaping and hierarchies for scaling to large multiagent systems
Authors
Abstract
Coordinating the actions of agents in multiagent systems is a challenging problem, especially as the size of the system increases and predicting agent interactions becomes difficult. Many approaches to improving coordination within multiagent systems have been developed, including organizational structures, shaped rewards, coordination graphs, heuristic methods, and learning automata. However, each of these approaches still has inherent limitations with respect to coordination and scalability. We explore the potential of synergistically combining existing coordination mechanisms so that they offset each other's limitations. More specifically, we are interested in combining existing coordination mechanisms in order to achieve improved performance, increased scalability, and reduced coordination complexity in large multiagent systems. In this work, we discuss and demonstrate the individual limitations of two well-known coordination mechanisms. We then provide a methodology for combining the two coordination mechanisms to offset their limitations and improve performance over either method individually. Here, we combine shaped difference rewards and hierarchical organization in two domains with up to 10,000 sensing agents. We show that combining hierarchical organization with difference rewards can improve both coordination and scalability by decreasing information overhead, structuring agent-to-agent connectivity and control flow, and improving the individual decision-making capabilities of agents. We show that by combining hierarchies and difference rewards, the information overhead and computational requirements of individual agents can be reduced by as much as 99% while simultaneously increasing overall system performance in two variations of the Defect Combination Problem. Additionally, we demonstrate that the approach is robust, handling up to 25% agent failures under various conditions.
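The mechanism being combined here, the difference reward, credits each agent with its marginal contribution to the system objective; pairing it with a hierarchy restricts that counterfactual calculation to a small sub-team instead of the full population. The following Python sketch is purely illustrative of that idea: the utility function, team structure, and all names are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

def difference_reward(global_utility: Callable[[Sequence[float]], float],
                      joint_action: Sequence[float],
                      agent_index: int,
                      counterfactual: float = 0.0) -> float:
    """D_i = G(z) - G(z with agent i's action replaced by a counterfactual)."""
    z = list(joint_action)
    g_with = global_utility(z)
    z[agent_index] = counterfactual  # replace agent i's action with a null action
    g_without = global_utility(z)
    return g_with - g_without

def hierarchical_difference_rewards(team_utility: Callable[[Sequence[float]], float],
                                    teams: Sequence[Sequence[float]]) -> list:
    """Evaluate each agent's difference reward against its own sub-team only,
    so the counterfactual computation scales with team size, not system size."""
    return [[difference_reward(team_utility, team, i) for i in range(len(team))]
            for team in teams]

if __name__ == "__main__":
    # Toy team utility (illustrative only): diminishing returns on combined sensor effort.
    toy_utility = lambda actions: sum(actions) - 0.1 * sum(a * a for a in actions)
    teams = [[0.5, 1.0, 0.2], [0.8, 0.3]]  # two small sub-teams of sensing agents
    print(hierarchical_difference_rewards(toy_utility, teams))
```

Because each counterfactual evaluation touches only one sub-team, the per-agent cost grows with team size rather than with the number of agents in the whole system, which is the scaling effect described in the abstract.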
Similar resources
Multiagent Learning with a Noisy Global Reward Signal
Scaling multiagent reinforcement learning to domains with many agents is a complex problem. In particular, multiagent credit assignment becomes a key issue as the system size increases. Some multiagent systems suffer from a global reward signal that is very noisy or difficult to analyze. This makes deriving a learnable local reward signal very difficult. Difference rewards (a particular instanc...
Effects of Shaping a Reward on Multiagent Reinforcement Learning
In reinforcement learning problems, agents take sequential actions with the goal of maximizing a time-delayed reward. In this chapter, the design of reward shaping for a continuing task in a multiagent domain is investigated. We use an interesting example, keepaway soccer (Kuhlmann, 2003; Stone, 2002; Stone, 2006), in which a team tries to maintain ball possession by avoiding the opponent’s int...
Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning
Reinforcement learning is a robust artificial intelligence solution for agents required to act in an environment, making their own decisions on how to behave. Typically an agent is deployed alone with no prior knowledge, but given sufficient time, a suitable state representation, and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating doma...
Potential-based difference rewards for multiagent reinforcement learning
Difference rewards and potential-based reward shaping can both significantly improve the joint policy learnt by multiple reinforcement learning agents acting simultaneously in the same environment. Difference rewards capture an agent’s contribution to the system’s performance. Potential-based reward shaping has been proven to not alter the Nash equilibria of the system but requires domain-speci...
Dynamic potential-based reward shaping
Potential-based reward shaping can significantly improve the time needed to learn an optimal policy and, in multiagent systems, the performance of the final joint-policy. It has been proven to not alter the optimal policy of an agent learning alone or the Nash equilibria of multiple agents learning together. However, a limitation of existing proofs is the assumption that the potential of a stat...
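For reference, the two shaping signals that recur in the entries above are conventionally written as follows. This is a generic textbook formulation (with G the global utility, z_{-i} the joint action with agent i's contribution replaced by a default, and Φ a potential function over states), not an excerpt from any of the listed papers.

```latex
% Difference reward: agent i's marginal contribution to the global utility G.
D_i(z) = G(z) - G(z_{-i})

% Potential-based reward shaping: added to the environment reward without
% altering the optimal policy (or, in multiagent settings, the Nash equilibria).
F(s, s') = \gamma\,\Phi(s') - \Phi(s)

% Dynamic variant: the potential is allowed to change over time.
F(s, t, s', t') = \gamma\,\Phi(s', t') - \Phi(s, t)
```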
Journal: Knowledge Eng. Review
Volume: 31, Issue: -
Pages: -
Publication date: 2016